| USN |  |  |  |  |  |
|-----|--|--|--|--|--|

10CS74

## Seventh Semester B.E. Degree Examination, Feb./Mar. 2022

## Advanced Computer Architectures

Time: 3 hrs.

Max. Marks: 100

Note: Answer any FIVE full questions, selecting at least TWO full questions from each part.

## PART - A

1 a. Illustrate the seven dimensions of an instruction set architecture. (07 Marks)

b. Briefly explain Amdahl's law. (05 Marks)

c. Explain the two main measures of dependability. (04 Marks)

d. A program runs in 10 seconds on computer A, which has a 400 MHz clock. A substantial increase in clock speed is possible, but it would cause computer B to require 1.2 times as many clock cycles as computer A. What should the clock rate of computer B be for it to run the program in 6 seconds? (04 Marks)

2 a. What is pipelining? With a neat diagram, explain how an instruction can be executed in 4 or 5 clock cycles in the MIPS datapath without pipeline registers. (10 Marks)

b. Explain the delayed branch scheme for reducing pipeline penalties. (05 Marks)

c. Explain the basics of the RISC instruction set. (05 Marks)

3 a. Explain the different types of data hazards with examples. (06 Marks)

b. With a neat diagram, give the basic structure of a floating-point unit using Tomasulo's algorithm, and explain the steps an instruction goes through in this approach. (09 Marks)

c. Explain the dynamic 2-bit branch prediction state diagram. (05 Marks)

4 a. What is a branch target buffer? With a neat diagram, explain the steps when using a branch target buffer for a simple five-stage pipeline. (10 Marks)

b. Suppose we have a VLIW that could issue two memory references, two FP operations, and one integer operation or branch in every clock cycle. Show an unrolled version of the loop X[i] = X[i] + S for such a processor. Unroll as many times as necessary to eliminate any stalls. Ignore delayed branches. Consider the following MIPS code:

    LOOP: L.D    F0, 0(R1)
          ADD.D  F4, F0, F2
          S.D    F4, 0(R1)
          DADDUI R1, R1, #-8
          BNE    R1, R2, LOOP

(05 Marks)

c. Write a note on value predictors. (05 Marks)

## PART - B

5 a. Explain Flynn's taxonomy of parallel architectures. (05 Marks)

b. Explain directory-based cache coherence for a distributed-memory multiprocessor system, along with the state transition diagram. (10 Marks)

c. Explain any one hardware primitive used to implement synchronization. (05 Marks)

6 a. Discuss any two techniques for reducing cache miss penalty. (08 Marks)

b. Assume we have a computer where the CPI is 1.0 when all memory accesses hit in the cache. The only data accesses are loads and stores, and these total 50% of the instructions. If the miss penalty is 25 clock cycles and the miss rate is 2%, how much faster would the computer be if all instructions were cache hits? (08 Marks)

c. Write a short note on the translation buffer technique for fast address translation. (04 Marks)

7 a. What are the categories of advanced optimizations of cache performance? Explain any one in detail. (10 Marks)

b. Define memory access time and cycle time. Explain DRAM memory technology with its basic organization. (10 Marks)

8 a. Explain the architecture of the Intel IA-64 processor. (08 Marks)

b. For the code given below, what are the dependencies between S1 and S2? Is this loop parallel? If not, show how to make it parallel.

    for (i = 1; i < 100; i = i + 1) {
        A[i] = A[i] + B[i];     /* S1 */
        B[i+1] = C[i] + D[i];   /* S2 */
    }

(08 Marks)

c. Write a brief note on predicated instructions. (04 Marks)